Constructing Reference Sets from Unstructured, Ungrammatical Text
Similar resources
Constructing Reference Sets from Unstructured, Ungrammatical Text
Vast amounts of text on the Web are unstructured and ungrammatical, such as classified ads, auction listings, forum postings, etc. We call such text “posts.” Despite their inconsistent structure and lack of grammar, posts are full of useful information. This paper presents work on semi-automatically building tables of relational information, called “reference sets,” by analyzing such posts dire...
Semantic annotation of unstructured and ungrammatical text
The Semantic Web will revolutionize the use of the internet, but the idea faces some major challenges. First, construction of the Semantic Web requires a lot of extra markup on documents, but this work should not be forced upon everyday users. Second, there is a lot of information that would be more useful if it were marked up for the Semantic Web, but the nature of the data makes it difficult ...
Beginning to Understand Unstructured, Ungrammatical Text: An Information Integration Approach
As information agents become pervasive, they will need to read and understand the vast amount of information on the World Wide Web. One such valuable source of information is unstructured and ungrammatical text that appears in data sources such as online auctions or internet classifieds. One way to begin to understand this text is to figure out the entities that the text references. This can be...
A Reference-set Approach to Information Extraction from Unstructured, Ungrammatical Data Sources
This thesis investigates information extraction from unstructured, ungrammatical text on the Web such as classified ads, auction listings, and forum postings. Since the data is unstructured and ungrammatical, this information extraction precludes the use of rule-based methods that rely on consistent structures within the text or natural language processing techniques that rely on grammar. Inste...
Creating Relational Data from Unstructured and Ungrammatical Data Sources
In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructu...
Journal
Journal title: Journal of Artificial Intelligence Research
Year: 2010
ISSN: 1076-9757
DOI: 10.1613/jair.2937